The source code for this post is available on my GitHub.
2.4 Aggregations: min, max, and other values
2.4.1 Summing the values in an array
import numpy as np
2.8464551359447516
np.sum(L)  # NumPy's own sum function
2.8464551359447516
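As a minimal sketch of the comparison above, the snippet below builds a small random array and sums it both ways. The name `L` matches the post, but the array size and the seeded generator are assumptions made here so the example is reproducible. The last lines illustrate one caveat: the second positional argument is a start value for the built-in `sum` but an axis for `np.sum`.

```python
import numpy as np

# Assumption: a seeded generator and 100 values, chosen only for reproducibility
rng = np.random.default_rng(0)
L = rng.random(100)

builtin_total = sum(L)      # Python's built-in sum, iterating element by element
numpy_total = np.sum(L)     # NumPy's compiled reduction
print(builtin_total, numpy_total)

# The two functions are not interchangeable: the second positional argument differs
x = np.arange(5)            # 0, 1, 2, 3, 4
start_total = sum(x, -1)    # -1 is the start value: -1 + 10 = 9
axis_total = np.sum(x, -1)  # -1 is the axis (the last one): 10
print(start_total, axis_total)
```

The two totals in the first `print` agree up to floating-point rounding, because the two functions may add the values in a different order.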
2.4.2 max & min
big_array = np.random.rand(1000)
164 µs ± 2.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
3.09 µs ± 41.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit min(big_array)
68.9 µs ± 2.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
max(big_array)
0.998342996761799
%timeit np.min(big_array)  # the NumPy version is clearly faster
3.11 µs ± 284 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
np.max(big_array)
0.998342996761799
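The `%timeit` magic only works inside IPython or Jupyter. A rough equivalent for a plain Python script can be sketched with the standard-library `timeit` module. The repeat count of 200 and the seeded generator are assumptions chosen here for illustration, and the measured times will vary by machine.

```python
import timeit

import numpy as np

# Assumption: seeded data so both calls see the same 1000 values
rng = np.random.default_rng(0)
big_array = rng.random(1000)

n_calls = 200  # hypothetical repeat count, kept small so the script runs quickly

# Time the built-in min (Python-level loop) against np.min (compiled loop)
t_builtin = timeit.timeit(lambda: min(big_array), number=n_calls)
t_numpy = timeit.timeit(lambda: np.min(big_array), number=n_calls)

print(f"built-in min: {t_builtin / n_calls * 1e6:.2f} us per call")
print(f"np.min:       {t_numpy / n_calls * 1e6:.2f} us per call")

# Both approaches return the same value; only the speed differs
same_result = min(big_array) == np.min(big_array)
print(same_result)
```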
# A more concise syntax is to call the methods directly on the array object
0.998342996761799 0.0013296091676711086
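The method-call form mentioned above might look like the following sketch. The seeded generator is an assumption for reproducibility, so the printed values will not match the output shown in the post.

```python
import numpy as np

# Assumption: seeded data, so values differ from the post's unseeded run
rng = np.random.default_rng(0)
big_array = rng.random(1000)

# Aggregates are available as methods on the ndarray itself
largest = big_array.max()
smallest = big_array.min()
total = big_array.sum()
print(largest, smallest, total)
```

The methods give the same results as the `np.max`, `np.min`, and `np.sum` functions; choosing between them is mostly a matter of style.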
1. Multidimensional aggregation
M = np.random.rand(3, 4)
[[0.17460472 0.8095875 0.98024377 0.58942287]
[0.16436885 0.47376126 0.27927504 0.55330698]
[0.1979097 0.3506765 0.48979371 0.07578097]]
M.min(axis=0)  # minimum of each column
array([0.16436885, 0.3506765 , 0.27927504, 0.07578097])
M.max(axis=1)  # maximum of each row
array([0.98024377, 0.55330698, 0.48979371])
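The `axis` argument names the dimension that gets collapsed, not the one that is kept, which is easy to mix up. The sketch below uses a small array of consecutive integers (an assumption chosen here instead of random data) so the results can be checked by eye.

```python
import numpy as np

# Assumption: a deterministic 3x4 array instead of random values
M = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

total = M.sum()            # no axis: aggregate over the whole array -> 66
col_min = M.min(axis=0)    # collapse the rows: one value per column -> [0 1 2 3]
row_max = M.max(axis=1)    # collapse the columns: one value per row -> [3 7 11]

print(total)
print(col_min)
print(row_max)
```

With `axis=0` the result has one entry per column (4 values), and with `axis=1` it has one entry per row (3 values), matching the shapes of the outputs above.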
2.4.3 Demo: heights of US presidents
import pandas as pd
data = pd.read_csv("data/president_heights.csv")
heights = np.array(data['height(cm)'])
[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173
174 183 183 168 170 178 182 180 183 178 182 188 175 179 183 193 182 183
177 185 188 188 182 185]
# Summary statistics
Mean height: 179.73809523809524
standard deviation: 6.931843442745892
Minimum height: 163
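The summary statistics can be reproduced without the CSV file by typing in the 42 heights printed above. The following is a sketch of how such a cell might look; the variable names other than `heights` are assumptions introduced here.

```python
import numpy as np

# The 42 heights (cm), copied from the array printed earlier in this post
heights = np.array([
    189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178,
    183, 193, 178, 173, 174, 183, 183, 168, 170, 178, 182, 180, 183, 178,
    182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185,
])

mean_height = heights.mean()
std_height = heights.std()   # population standard deviation (ddof=0 is the default)
min_height = heights.min()
max_height = heights.max()

print("Mean height:       ", mean_height)
print("Standard deviation:", std_height)
print("Minimum height:    ", min_height)
print("Maximum height:    ", max_height)
```

Note that `heights.std()` divides by N rather than N - 1; passing `ddof=1` would give the sample standard deviation instead.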
# Compute percentiles:
25th percentile: 174.25
Median: 182.0
75 percentile 183.0
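A 25th percentile of 174.25 may look odd when every height is a whole number. By default `np.percentile` interpolates linearly between the two nearest sorted values. The tiny array below is a hypothetical example chosen to make that interpolation easy to follow.

```python
import numpy as np

# Hypothetical data: four sorted values at positions 0, 1, 2, 3
values = np.array([1, 2, 3, 4])

# The q-th percentile sits at position q/100 * (n - 1) in the sorted data.
# For q = 25 and n = 4 that is position 0.75, i.e. 1 + 0.75 * (2 - 1) = 1.75.
p25 = np.percentile(values, 25)
median = np.median(values)         # position 1.5 -> 2.5
p75 = np.percentile(values, 75)    # position 2.25 -> 3.25

print(p25, median, p75)
```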
%matplotlib inline
import matplotlib.pyplot as plt
plt.hist(heights)
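The numbers behind such a histogram can be inspected without a plotting backend. With default settings, both `plt.hist` and `np.histogram` use 10 equal-width bins, so the sketch below prints the counts each bar would represent. The heights are copied from the array printed earlier; the remaining names are assumptions introduced here.

```python
import numpy as np

# The 42 heights (cm), copied from the array printed earlier in this post
heights = np.array([
    189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178,
    183, 193, 178, 173, 174, 183, 183, 168, 170, 178, 182, 180, 183, 178,
    182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185,
])

# Default: 10 equal-width bins spanning the minimum to the maximum height
counts, edges = np.histogram(heights)

# Print each bin as "left edge - right edge: count"
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} - {right:5.1f} cm: {count}")

# The fullest bin covers 181 to 184 cm, which contains the median of 182
busiest = counts.argmax()
print(edges[busiest], edges[busiest + 1], counts[busiest])
```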